3 research outputs found

Statistical Analysis of Spherical Data: Clustering, Feature Selection and Applications

Author: Amayri Ola
Publication date: 28/09/2014

In light of interdisciplinary applications, the data to be studied and analyzed have grown in volume and changed in their intrinsic structure and type. In practice, the diversity of sources generating these objects has imposed several challenges for decision makers trying to identify informative data, in terms of time, model capability, scalability and knowledge discovery. Thus, it is highly desirable to be able to extract patterns of interest that support data management decisions. Clustering, among other machine learning approaches, is an important data engineering technique that enables the automatic discovery of clusters of similar objects and the subsequent assignment of new, unseen objects to appropriate clusters. In this context, the majority of current research does not completely address the true structure and nature of the data for the particular application at hand. In contrast to most previous research, our proposed work focuses on the modeling and classification of spherical data, which are naturally generated in many data mining and knowledge discovery applications. Thus, in this thesis we propose several estimation and feature selection frameworks, based on the Langevin distribution, which are devoted to spherical patterns in offline and online settings. We first formulate a unified probabilistic framework in which we build probabilistic kernels for Support Vector Machines based on the Fisher score and information divergences from finite Langevin mixtures. We are motivated by the fact that blending generative and discriminative approaches has proved effective: it exploits the distinct characteristics of each approach to construct a complementary system that combines the best of both.
Due to the high demand for compact and accurate statistical models that adjust automatically to dynamic changes, we next propose probabilistic frameworks for high-dimensional spherical data modeling based on finite Langevin mixtures that allow simultaneous clustering and feature selection in offline and online settings. To this end, we adopt finite mixture models, whose estimation has long relied heavily on deterministic learning approaches such as maximum likelihood estimation. Despite their successful use in a wide spectrum of areas, these approaches have several drawbacks, as we discuss in this thesis. An alternative is Bayesian inference, which naturally addresses data uncertainty while ensuring good generalization. To address this issue, we also propose a Bayesian approach for finite Langevin mixture model estimation and selection. When data change dynamically and grow drastically, a finite mixture is not always a feasible solution. In contrast with previous approaches, which suppose an unknown finite number of mixture components, we finally propose a nonparametric Bayesian approach which assumes an infinite number of components. We further enhance our model by simultaneously detecting informative features in the process of clustering. Through extensive empirical experiments, we demonstrate the merits of the proposed learning frameworks on diverse high-dimensional datasets and challenging real-world applications.
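
The Langevin distribution named in the abstract above is also known as the von Mises-Fisher distribution. As a rough illustration only, the sketch below evaluates its log density for 3-dimensional unit vectors, where the normalizing constant has the closed form kappa / (4 * pi * sinh(kappa)), and computes soft cluster memberships under a finite mixture. The restriction to three dimensions, the function names, and the parameter values are assumptions made for this example; the thesis itself targets high-dimensional data and Bayesian estimation, which this sketch does not attempt.

```python
import math


def vmf_log_density(x, mu, kappa):
    """Log density of a von Mises-Fisher (Langevin) distribution on the
    unit sphere in 3-D. Assumes x and mu are unit vectors and kappa > 0."""
    dot = sum(a * b for a, b in zip(x, mu))
    # log(kappa / (4*pi*sinh(kappa))), rewritten to avoid overflow for large kappa:
    # sinh(kappa) = exp(kappa) * (1 - exp(-2*kappa)) / 2
    log_norm = (math.log(kappa) - math.log(2 * math.pi) - kappa
                - math.log(-math.expm1(-2 * kappa)))
    return log_norm + kappa * dot


def responsibilities(x, weights, mus, kappas):
    """Posterior probability of each mixture component given x
    (the E-step of a plain EM scheme, shown here for illustration)."""
    logs = [math.log(w) + vmf_log_density(x, m, k)
            for w, m, k in zip(weights, mus, kappas)]
    top = max(logs)  # subtract the maximum before exponentiating for stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical two-component mixture with mean directions along z and x.
r = responsibilities([0.0, 0.0, 1.0],
                     weights=[0.5, 0.5],
                     mus=[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
                     kappas=[10.0, 10.0])
```

Here `r` holds one membership probability per component; a point lying on the first mean direction is assigned almost entirely to the first component.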

Concordia University Research Repository

On email spam filtering using support vector machine

Author: Amayri Ola
Publication date: 01/01/2009

Electronic mail is a major revolution over traditional communication systems, owing to its convenient, economical, fast, and easy-to-use nature. A major bottleneck in electronic communications is the enormous dissemination of unwanted, harmful emails known as "spam emails". A major concern is developing suitable filters that can adequately capture those emails and achieve a high performance rate. Machine learning (ML) researchers have developed many approaches to tackle this problem. Within machine learning, support vector machines (SVM) have made a large contribution to the development of spam email filtering. Based on SVM, different schemes have been proposed through text classification (TC) approaches. A crucial problem when using SVM is the choice of kernel, as it directly affects the separation of emails in the feature space. We investigate the use of several distance-based kernels to specify spam filtering behaviors using SVM. However, most of the kernels used concern continuous data and neglect the structure of the text. In contrast to classical blind kernels, we propose the use of various string kernels for spam filtering. We show how effectively string kernels suit the spam filtering problem. In addition, data preprocessing is a vital part of text classification, where the objective is to generate feature vectors usable by SVM kernels. We detail a feature mapping variant in TC that yields improved performance for the standard SVM in the filtering task. Furthermore, we propose an online active framework for spam filtering. We present empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in real time. We show that the active online method using string kernels achieves higher precision and recall rates.
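
The abstract above does not say which string kernels were used. As one common example of the idea, the sketch below implements a k-spectrum kernel, which compares two texts by the substrings of length k they share and so works on the raw character sequence rather than on continuous feature vectors. The choice of the spectrum kernel, the value k = 3, and the function names are assumptions made for illustration, not the author's method.

```python
from collections import Counter
from math import sqrt


def spectrum_counts(text, k=3):
    """Count every contiguous substring of length k in the text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: inner product of the two substring-count vectors."""
    cs, ct = spectrum_counts(s, k), spectrum_counts(t, k)
    if len(cs) > len(ct):
        cs, ct = ct, cs  # iterate over the smaller table
    return sum(count * ct[sub] for sub, count in cs.items())


def normalized_spectrum_kernel(s, t, k=3):
    """Cosine-normalized kernel, so that text length does not dominate."""
    denom = sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
    return spectrum_kernel(s, t, k) / denom if denom > 0 else 0.0


# Hypothetical messages: the first two share many character 3-grams.
a = "win a free prize now"
b = "claim your free prize"
c = "meeting moved to noon"
similar = normalized_spectrum_kernel(a, b)
different = normalized_spectrum_kernel(a, c)
```

A matrix of such kernel values between all pairs of training emails could then be passed to any SVM implementation that accepts a precomputed kernel.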

CiteSeerX

Concordia University Research Repository

A Bayesian analysis of spherical pattern based on finite Langevin mixture

Author: Ola Amayri, Nizar Bouguila
Publication venue: 'Elsevier BV'

Crossref